Fault Tolerance via Replication in Coarse Grain Data-Flow
Abstract
Recent advances in network technology promise to make gigabit-per-second bandwidth between remote hosts a reality in the near future. This increase in bandwidth paves the way for greater exploitation of distributed computing resources. Coupled with advances in distributed-memory parallel compiler technology, there is strong reason to believe that wide-area distributed parallel processing will be an increasingly popular and important programming paradigm. Parallelizing and distributing program sub-tasks has the potential to increase performance for many applications while also improving the overall utilization of system resources. Unfortunately, there is a downside. When a program is partitioned into sub-tasks, each sub-task is potentially distributed to a different processor. As the number of processors employed by an application increases, so does the chance that the application will fail due to a host/processor failure.
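The last claim of the abstract can be made concrete with a small calculation. The sketch below is illustrative only and is not taken from the paper: it assumes every host fails independently with the same probability, that the application fails if any sub-task is lost, and that a replicated sub-task is lost only when all of its replicas fail. The function name and the parameters `p_host`, `n_subtasks`, and `replicas` are hypothetical.

```python
def app_failure_probability(p_host: float, n_subtasks: int, replicas: int = 1) -> float:
    """Estimate the chance that an application loses at least one sub-task.

    Assumptions (not from the paper): host failures are independent and
    identically likely, each replica runs on its own host, and a sub-task
    is lost only if every one of its replicas fails.
    """
    if not 0.0 <= p_host <= 1.0:
        raise ValueError("p_host must be a probability in [0, 1]")
    if n_subtasks < 0 or replicas < 1:
        raise ValueError("n_subtasks must be >= 0 and replicas must be >= 1")

    # A sub-task is lost only when all of its replicas fail.
    p_subtask_lost = p_host ** replicas
    # The application survives only if every sub-task survives.
    p_all_survive = (1.0 - p_subtask_lost) ** n_subtasks
    return 1.0 - p_all_survive


if __name__ == "__main__":
    # Compare an unreplicated run with a run that uses two replicas per sub-task.
    for n in (1, 10, 100):
        print(n, app_failure_probability(0.01, n), app_failure_probability(0.01, n, replicas=2))
```

Under these assumptions, the failure probability grows with the number of sub-tasks and shrinks as replicas are added. Real hosts may fail in correlated ways, so the numbers should be read as a rough guide rather than a prediction.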
Similar resources
AR-SMT: Coarse-Grain Time Redundancy for High Performance General Purpose Processors
Time redundancy is a fault tolerance technique in which a task, either computation or communication, is performed multiple times on the same hardware. This technique is cheaper than other fault tolerance solutions that require some form of hardware redundancy, because it does not require replicated hardware. However, fault coverage may be lower with time redundancy as it only captures certain c...
Fault Tolerant Wide-Area Parallel Computing
Executing parallel applications across distributed networks introduces the problem of fault tolerance. A viable solution for fault tolerance must keep overhead manageable and not compromise the high performance objective of parallel processing. In this paper, we explore two options for achieving fault tolerance for a common class of parallel applications, single-program-multiple-data (SPMD). We...
RAIDb: Redundant Array of Inexpensive Databases
Clusters of workstations are becoming increasingly popular for powering data server applications such as large-scale Web sites or e-Commerce applications. There has been much research on scaling the front tiers (web servers and application servers) using clusters, but databases usually remain on large dedicated SMP machines. In this paper, we address database performance scalability and high availabilit...
Robustness Analysis of Distributed Databases via Fault Injection
As the importance of reliable data storage increases, various techniques are applied to achieve reliability. The use of cloud providers that are assumed to be fault tolerant is increasing, even though insufficient research has been performed on their fault tolerance. On cloud servers, a distributed Database Management System (DBMS) can be used to achieve safe data storage. Various DBMSs are mak...
Publication date: 1996